{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Effect of the shrinkage factor in random over-sampling\n\nThis example shows the effect of the shrinkage factor used to generate the\nsmoothed bootstrap using the\n:class:`~imblearn.over_sampling.RandomOverSampler`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n# License: MIT"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(__doc__)\n\nimport seaborn as sns\n\nsns.set_context(\"poster\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "First, we will generate a toy classification dataset with only few samples.\nThe ratio between the classes will be imbalanced.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from collections import Counter\nfrom sklearn.datasets import make_classification\n\nX, y = make_classification(\n    n_samples=100,\n    n_features=2,\n    n_redundant=0,\n    weights=[0.1, 0.9],\n    random_state=0,\n)\nCounter(y)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n\nfig, ax = plt.subplots(figsize=(7, 7))\nscatter = plt.scatter(X[:, 0], X[:, 1], c=y, alpha=0.4)\nclass_legend = ax.legend(*scatter.legend_elements(), loc=\"lower left\", title=\"Classes\")\nax.add_artist(class_legend)\nax.set_xlabel(\"Feature #1\")\n_ = ax.set_ylabel(\"Feature #2\")\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, we will use a :class:`~imblearn.over_sampling.RandomOverSampler` to\ngenerate a bootstrap for the minority class with as many samples as in the\nmajority class.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from imblearn.over_sampling import RandomOverSampler\n\nsampler = RandomOverSampler(random_state=0)\nX_res, y_res = sampler.fit_resample(X, y)\nCounter(y_res)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig, ax = plt.subplots(figsize=(7, 7))\nscatter = plt.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.4)\nclass_legend = ax.legend(*scatter.legend_elements(), loc=\"lower left\", title=\"Classes\")\nax.add_artist(class_legend)\nax.set_xlabel(\"Feature #1\")\n_ = ax.set_ylabel(\"Feature #2\")\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We observe that the minority samples are less transparent than the samples\nfrom the majority class. Indeed, it is due to the fact that these samples\nof the minority class are repeated during the bootstrap generation.\n\nWe can set `shrinkage` to a floating value to add a small perturbation to the\nsamples created and therefore create a smoothed bootstrap.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sampler = RandomOverSampler(shrinkage=1, random_state=0)\nX_res, y_res = sampler.fit_resample(X, y)\nCounter(y_res)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig, ax = plt.subplots(figsize=(7, 7))\nscatter = plt.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.4)\nclass_legend = ax.legend(*scatter.legend_elements(), loc=\"lower left\", title=\"Classes\")\nax.add_artist(class_legend)\nax.set_xlabel(\"Feature #1\")\n_ = ax.set_ylabel(\"Feature #2\")\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In this case, we see that the samples in the minority class are not\noverlapping anymore due to the added noise.\n\nThe parameter `shrinkage` allows to add more or less perturbation. Let's\nadd more perturbation when generating the smoothed bootstrap.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sampler = RandomOverSampler(shrinkage=3, random_state=0)\nX_res, y_res = sampler.fit_resample(X, y)\nCounter(y_res)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig, ax = plt.subplots(figsize=(7, 7))\nscatter = plt.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.4)\nclass_legend = ax.legend(*scatter.legend_elements(), loc=\"lower left\", title=\"Classes\")\nax.add_artist(class_legend)\nax.set_xlabel(\"Feature #1\")\n_ = ax.set_ylabel(\"Feature #2\")\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Increasing the value of `shrinkage` will disperse the new samples. Forcing\nthe shrinkage to 0 will be equivalent to generating a normal bootstrap.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "sampler = RandomOverSampler(shrinkage=0, random_state=0)\nX_res, y_res = sampler.fit_resample(X, y)\nCounter(y_res)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig, ax = plt.subplots(figsize=(7, 7))\nscatter = plt.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.4)\nclass_legend = ax.legend(*scatter.legend_elements(), loc=\"lower left\", title=\"Classes\")\nax.add_artist(class_legend)\nax.set_xlabel(\"Feature #1\")\n_ = ax.set_ylabel(\"Feature #2\")\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Therefore, the `shrinkage` is handy to manually tune the dispersion of the\nnew samples.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}